The rapidly evolving cybersecurity landscape demands intelligent, multi-modal, and explainable threat detection systems. With over 3.5 billion Android devices worldwide and millions of new malware samples appearing every year, traditional signature-based tools have become increasingly ineffective. Moreover, modern cyber threats do not arrive through a single channel: attackers exploit Android applications, malicious documents, phishing URLs, and harmful content simultaneously, yet most available tools address only one of these vectors at a time. This project presents AEGIS 2026 (Adaptive Engine for General Intelligence Security), a production-ready, AI-powered multi-layer cybersecurity platform engineered to detect threats across four distinct modalities: Android APK malware, malicious documents (PDF, DOCX, XLSX), phishing and malicious URLs, and harmful text content. The system is built on a soft-voting ensemble of XGBoost and Random Forest classifiers trained on 54+ static features extracted using the androguard library, achieving a test accuracy of 96.3 percent. The platform is deployed as a secure Flask REST API exposing 14 endpoints, with engineering controls including rate limiting (50 requests/hour/IP), werkzeug secure file handling, SHA-256-keyed LRU caching, SQLite persistence for scan history, and automated forensic PDF report generation via ReportLab. A browser-based Single Page Application (SPA) dashboard with Chart.js visualisations and voice alert capability completes the user-facing interface. AEGIS addresses five documented limitations of existing tools (signature dependency, single-modality coverage, black-box AI decisions, absence of production engineering, and string-match false positives) and outperforms all compared baseline systems across every measured metric.
Introduction
AEGIS 2026 is a final-year B.Tech project that develops a complete, production-ready, AI-driven cybersecurity platform. It addresses a major gap in existing tools by providing a unified, open-source system capable of analyzing multiple threat types—Android apps, documents, URLs, and text—within a single framework while offering explainable AI results.
The system consists of four main detection engines: an APK malware scanner, a document threat analyzer, a phishing URL detector, and a content moderation engine. These components work together through a unified REST API, web dashboard, and automated report generation system.
AEGIS overcomes key limitations of traditional cybersecurity tools, such as reliance on signature-based detection, inability to detect zero-day or polymorphic malware, lack of integration across threat types, and absence of explainability. It uses machine learning models (XGBoost and Random Forest) along with SHAP and LLM-based explanations to provide both accurate and transparent results.
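To make the soft-voting idea concrete, the sketch below averages per-class probabilities from two classifiers and picks the class with the higher combined score. It is a minimal illustration in plain Python, not the AEGIS implementation: the function name `soft_vote`, the equal default weights, and the example probabilities are all assumptions, and in practice the probabilities would come from trained XGBoost and Random Forest models.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of per-class probabilities.

    prob_sets holds one probability vector per classifier, all in the
    same class order. Equal weights are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total_weight = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total_weight
        for c in range(n_classes)
    ]


# Hypothetical outputs for one APK, ordered as [P(benign), P(malicious)].
xgb_probs = [0.30, 0.70]  # the boosted model is fairly sure it is malicious
rf_probs = [0.55, 0.45]   # the forest leans slightly towards benign

combined = soft_vote([xgb_probs, rf_probs])
label = "malicious" if combined[1] >= 0.5 else "benign"
print(combined, label)
```

In this example a hard (majority) vote would tie, since each model predicts a different class. Soft voting breaks the tie in favour of the more confident model, which is one common reason to prefer it for a two-model ensemble.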
The platform is designed with a scalable four-tier architecture, including a frontend dashboard, backend API, intelligence layer, and external integrations. It is modular, cost-effective, and deployable, requiring minimal resources.
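One of the backend controls named in the abstract is SHA-256-keyed LRU caching, which lets the API skip re-analysis when the same file is submitted again. The sketch below shows one way such a cache could look using only the Python standard library. The class name `ScanCache`, its methods, and the result dictionaries are hypothetical and are not taken from the AEGIS code base.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class ScanCache:
    """Least-recently-used cache keyed by the SHA-256 digest of file bytes."""

    def __init__(self, max_entries: int = 128) -> None:
        self.max_entries = max_entries
        self._store: "OrderedDict[str, dict]" = OrderedDict()

    @staticmethod
    def key_for(data: bytes) -> str:
        # Identical file contents always map to the same 64-character key.
        return hashlib.sha256(data).hexdigest()

    def get(self, data: bytes) -> Optional[dict]:
        key = self.key_for(data)
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, data: bytes, result: dict) -> None:
        key = self.key_for(data)
        self._store[key] = result
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used


# Usage with a deliberately tiny cache to show eviction.
cache = ScanCache(max_entries=2)
cache.put(b"sample-a", {"verdict": "benign"})
cache.put(b"sample-b", {"verdict": "malicious"})
cache.get(b"sample-a")                      # touch a, so b becomes the oldest
cache.put(b"sample-c", {"verdict": "benign"})
evicted = cache.get(b"sample-b")            # b was evicted when c was added
```

Keying on the content hash rather than the filename means a renamed copy of an already-scanned file still hits the cache.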
Compared to existing tools like VirusTotal or ClamAV, AEGIS offers multi-modal analysis, real-time detection, lower false positives (via call-graph analysis), and detailed, user-friendly explanations.
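The difference between string matching and call-graph analysis can be illustrated with a small reachability check. The sketch assumes that "call-graph analysis" means flagging a sensitive API only when it is reachable from an app entry point. The source does not spell out the exact mechanism, so this interpretation, the toy graph, and every method name in it are illustrative assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_methods(
    call_graph: Dict[str, List[str]], entry_points: Iterable[str]
) -> Set[str]:
    """Breadth-first search over caller -> callees edges from the entry points."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


SENSITIVE_APIS = {"SmsManager.sendTextMessage", "TelephonyManager.getDeviceId"}

# Hypothetical app: the SMS API appears only in bundled library code that
# no entry point ever calls.
call_graph = {
    "MainActivity.onCreate": ["Helper.loadConfig"],
    "Helper.loadConfig": ["TelephonyManager.getDeviceId"],
    "UnusedLib.sendPromo": ["SmsManager.sendTextMessage"],
}
strings_in_binary = {"SmsManager.sendTextMessage", "TelephonyManager.getDeviceId"}

# String matching flags every sensitive name that occurs anywhere in the file.
string_match_hits = SENSITIVE_APIS & strings_in_binary

# The call-graph check flags only APIs the app can actually reach.
call_graph_hits = SENSITIVE_APIS & reachable_methods(
    call_graph, ["MainActivity.onCreate"]
)
```

Here string matching reports both APIs, while the reachability check reports only `TelephonyManager.getDeviceId`. The unreachable SMS call is the kind of false positive the call-graph approach is meant to remove.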
Performance evaluation shows high accuracy, precision, recall, and efficiency: the soft-voting ensemble reaches a test accuracy of 96.3 percent, an F1-score of 0.955, and a ROC-AUC of 0.979 on a held-out test set, demonstrating that AEGIS is both technically robust and practical. Overall, it represents an advanced, integrated approach to modern cybersecurity, combining machine learning, system design, and usability.
Conclusion
This project has successfully designed, implemented, and evaluated AEGIS 2026 (Adaptive Engine for General Intelligence Security), a production-ready, AI-powered multi-layer cybersecurity platform. The system addresses five well-documented limitations of existing security tools: signature dependency on zero-day-vulnerable blacklists; single-modality coverage that forces analysts to switch between multiple disparate tools; black-box AI decisions that cannot be acted on with confidence; the absence of production engineering features in academic prototypes; and the excessive false positive rates generated by string-based API detection in Android malware scanners.
The technical achievements of AEGIS can be summarised across three dimensions:
The soft-voting XGBoost + Random Forest ensemble achieves a test accuracy of 96.3 percent, an F1-score of 0.955, and a ROC-AUC of 0.979 on a held-out test set. Five-fold stratified cross-validation confirms performance stability with a mean AUC of 0.977 ± 0.003. The call-graph-based API detection mechanism reduces the false positive rate from 18.7 percent (string-matching baseline) to 2.3 percent, a relative improvement of 87.7 percent.
AEGIS is not merely a college project. It is a production-ready foundation for a real-world cybersecurity product, one that outperforms every open-source tool it was compared against on the dimensions that matter most: detection accuracy, explainability, false positive rate, and operational readiness.
The current AEGIS implementation relies entirely on static analysis: it does not execute the APK. Malware that downloads its payload at runtime (for example, via DexClassLoader and a remote server) or uses obfuscated reflection to call dangerous APIs at execution time will evade static detection. Integrating CuckooDroid, an Android-specific sandbox environment, as a second analysis phase would execute the APK in a monitored virtual device and capture system call sequences, network communications, and file system changes. These dynamic features would be encoded as a complementary feature vector and fed to a second-stage classifier.
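The 87.7 percent relative improvement in false positive rate reported in the results above follows directly from the two rates. The short check below reproduces the arithmetic; the helper name `relative_reduction` is an assumption introduced only for illustration.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


fpr_string_matching = 18.7  # percent, string-matching baseline
fpr_call_graph = 2.3        # percent, call-graph-based detection

reduction = relative_reduction(fpr_string_matching, fpr_call_graph)
print(f"{reduction:.1f}% relative reduction")  # prints "87.7% relative reduction"
```

Note that this is a relative figure: the absolute drop is 16.4 percentage points (18.7 minus 2.3).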